0.1 Data Story

The impact of the COVID-19 pandemic on electricity consumption in the different suburbs of Sydney

As COVID-19 struck Australia, our reliance on digital equipment, especially remote streaming apps, increased drastically. The hypothesis is therefore that the greater use of digital equipment has led to an increase in electricity consumption.

Taking a quick view of the data

# For the 2019 data set
dim(data2019)
## [1] 39 17
head(data2019, 10)
## # A tibble: 10 × 17
##    ...1  ...2  ...3  ...4  ...5  ...6  ...7  ...8  ...9  ...10 ...11 ...12 ...13
##    <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
##  1 <NA>  <NA>  <NA>  <NA>  <NA>  <NA>  <NA>  <NA>  <NA>   <NA> <NA>  <NA>   <NA>
##  2 Loca… Resi… <NA>  <NA>  <NA>  <NA>  <NA>  Solar <NA>   <NA> <NA>  <NA>  "Non…
##  3 <NA>  Dail… MWh   <NA>  <NA>  Cust… <NA>  Numb… <NA>  "Gen… <NA>  Ener… "MWh"
##  4 <NA>  <NA>  Gene… Off … Total Off … Total Res   Non-… "Res… Non-… <NA>   <NA>
##  5 BAYS… 12.4… 2756… 2947… 3051… 1552… 6697… 2842  190   "917… 3900… 5848… "136…
##  6 BURW… 12.8… 6442… 3814… 6824… 2241… 1455… 684   35    "222… 631.… 1434… "414…
##  7 CANA… 13.4… 1784… 8682… 1870… 5259… 3805… 1601  85    "588… 3572… 3427… "678…
##  8 CANT… 15.2… 6444… 8135… 7258… 3701… 1299… 9947  561   "320… 1613… 2038… "231…
##  9 CENT… 16.4… 7375… 1699… 9074… 8133… 1510… 21015 674   "754… 1918… 5525… "222…
## 10 CESS… 18.9… 1451… 2206… 1671… 1076… 2412… 4329  204   "194… 4744… 1192… "431…
## # … with 4 more variables: ...14 <chr>, ...15 <chr>, ...16 <chr>, ...17 <chr>
# For the 2020 data set
dim(data2020)
## [1] 41 17
head(data2020, 10)
## # A tibble: 10 × 17
##    ...1  ...2  ...3  ...4  ...5  ...6  ...7  ...8  ...9  ...10 ...11 ...12 ...13
##    <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
##  1 Regi… Loca… Resi… <NA>  <NA>  <NA>  <NA>  <NA>  Solar <NA>   <NA> <NA>  <NA> 
##  2 <NA>  <NA>  Dail… MWh   <NA>  <NA>  Cust… <NA>  Numb… <NA>  "Gen… <NA>  Ener…
##  3 <NA>  <NA>  <NA>  Gene… Off … Total Off … Total Res   Non-… "Res… Non-… <NA> 
##  4 Sydn… BAYS… 12.4  2778… 28618 3064… 15314 67909 3134  212   "113… 4871  7584 
##  5 <NA>  BURW… 12.6  63676 3638  67314 2196  14600 753   38    "273… 674   1696 
##  6 <NA>  CANA… 13.3  1778… 8305  1861… 5075  38223 1805  99    "729… 3950  4406 
##  7 <NA>  CANT… 15.1  6416… 79006 7206… 36476 1310… 10900 631   "389… 17862 26313
##  8 <NA>  CUMB… 13.1  1019… 8697  1106… 4027  23149 1908  129   "619… 4167  4519 
##  9 <NA>  GEOR… 14.6  2699… 39761 3097… 19182 58079 4351  181   "159… 3850  9931 
## 10 <NA>  HORN… 18.2  3013… 49437 3507… 20861 52676 6789  180   "282… 4519  16246
## # … with 4 more variables: ...14 <chr>, ...15 <chr>, ...16 <chr>, ...17 <chr>
# For the 2023 data set
dim(data2023)
## [1] 41 17
head(data2023, 10)
## # A tibble: 10 × 17
##    ...1  ...2  ...3  ...4  ...5  ...6  ...7  ...8  ...9  ...10 ...11 ...12 ...13
##    <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
##  1 Regi… Loca… Resi… <NA>  <NA>  <NA>  <NA>  <NA>  Solar <NA>   <NA> <NA>  <NA> 
##  2 <NA>  <NA>  Dail… MWh   <NA>  <NA>  Cust… <NA>  Numb… <NA>  "Gen… <NA>  Ener…
##  3 <NA>  <NA>  <NA>  Gene… Off … Total Off … Total Res   Non-… "Res… Non-… <NA> 
##  4 Sydn… BAYS… 12.6… 2882… 2733… 3155… 14792 68610 4381  279   "211… 6862… 1605…
##  5 <NA>  BURW… 12.4… 6441… 3376… 6779… 2101  14861 995   52    "469… 988.… 3398…
##  6 <NA>  CANA… 13.5… 1830… 7881… 1909… 4765  38740 2490  117   "131… 4284… 9463…
##  7 <NA>  CANT… 15.0… 6567… 7574… 7325… 35055 1334… 14679 823   "696… 2682… 5498…
##  8 <NA>  CUMB… 12.8… 1012… 8201… 1094… 3853  23410 2550  179   "115… 6067… 9777…
##  9 <NA>  GEOR… 14.6… 2772… 3853… 3157… 18522 59056 6046  234   "307… 6387… 2171…
## 10 <NA>  HORN… 18.4… 3095… 4985… 3594… 20435 53315 9222  222   "495… 6107… 3195…
## # … with 4 more variables: ...14 <chr>, ...15 <chr>, ...16 <chr>, ...17 <chr>

Some useful functions

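library(stringr)  # provides str_detect(), used in ex_cwords()

# Keep the entries of `data` that contain digits, convert them to numeric,
# and return them with every occurrence of the largest value removed.
# Entries that cannot be converted to a number are dropped.
filter_int <- function(data){
  numbers <- suppressWarnings(as.numeric(grep("\\d+", data, value = TRUE)))
  numbers <- numbers[!is.na(numbers)]
  return(numbers[numbers != max(numbers)])
}
# Keep only the entries made up entirely of capital letters
# (e.g. "ABC", but not "Abc" or "A B").
ex_cwords <- function(words) {
  capital_words <- words[str_detect(words, "^([A-Z]+)$")]
  return(capital_words)
}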
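# Wrap each string in `x` at `len` characters, joining the pieces with newlines.
wrap.it <- function(x, len) {
  sapply(x, function(y) paste(strwrap(y, len), collapse = "\n"),
         USE.NAMES = FALSE)
}

# Apply wrap.it() to a character vector, or to every element of a list.
wrap.labels <- function(x, len) {
  if (is.list(x)) {
    lapply(x, wrap.it, len)
  } else {
    wrap.it(x, len)
  }
}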
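
As a quick illustration of what the helpers above do, the sketch below repeats the `strwrap()` and `paste()` step inline on two made-up labels and checks the capital-letter pattern on a few made-up strings. The inputs and the name `wrap_demo` are hypothetical; the wrapping step is copied into the snippet only so that it runs on its own.

```r
# Hypothetical labels, used only for illustration
labels <- c("CANTERBURY-BANKSTOWN", "CITY OF SYDNEY AND SURROUNDS")

# Same idea as wrap.it(): split each label at whitespace into short lines,
# then join the lines with a newline
wrap_demo <- function(x, len) {
  sapply(x, function(y) paste(strwrap(y, len), collapse = "\n"),
         USE.NAMES = FALSE)
}

wrapped <- wrap_demo(labels, 15)
cat(wrapped, sep = "\n---\n")

# The pattern used in ex_cwords() accepts a single all-capital word only
is_capital_word <- grepl("^([A-Z]+)$", c("ABC", "Total", "A B"))
print(is_capital_word)
```

Note that `strwrap()` only breaks at whitespace, so a hyphenated name stays on one line, and that the capital-letter pattern rejects names containing a space.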

0.2 Initial Data Analysis (IDA)

To get an idea of the new data, we draw a quick bar chart of the ‘general supply’ columns from the Excel file. (After several rounds of trial and error, we found that the original Excel file is not suitable for data analysis in R, so the relevant columns were extracted manually in Excel into a usable form.)
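
The manual extraction described above could also be scripted. The following is only a sketch, assuming a raw layout similar to the 2019 preview (four header rows, area names in the first column, and a general-supply figure in the third column). The toy data frame and its numbers are made up, and `raw`, `n_header`, `tidy`, `suburb`, and `general_supply_mwh` are hypothetical names that are not part of the original analysis.

```r
# Toy stand-in for a raw sheet in which every column was read as text.
# All values are invented; only the layout mimics the preview above.
raw <- data.frame(
  c1 = c(NA, "Area", NA, NA, "AREA A", "AREA B", "Total"),
  c2 = c(NA, "Residential", "Daily average", NA, "12.4", "12.8", "12.6"),
  c3 = c(NA, NA, "MWh", "General supply", "1000.5", "2000.25", "3000.75"),
  stringsAsFactors = FALSE
)

n_header <- 4                       # assumed number of header rows
body <- raw[-seq_len(n_header), ]   # keep only the data rows

# Build a tidy table with a text column and a numeric column
tidy <- data.frame(
  suburb = body$c1,
  general_supply_mwh = as.numeric(body$c3),
  stringsAsFactors = FALSE
)

# Drop a totals row, assuming the sheet labels it "Total"
tidy <- tidy[tidy$suburb != "Total", ]
print(tidy)
```

Scripting the step keeps the extraction reproducible, but the row and column positions would need to be checked against each year's file, since the previews show that the layout differs between years.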

library(readxl)
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
options(scipen = 999)
extracted_data <- read_excel("data/Electricity_consumption(extracted).xlsx")
# Note: arrange() is called without sorting columns, so the row order is unchanged
extracted_data <- arrange(extracted_data)

# Setting up the output graph
par(mar = c(5, 8, 0, 2), mgp = c(4, 1, 0))

# The 2019 dataset (from general supply)
values2019 <- extracted_data$`2019-R`
suburbs2019 <- extracted_data$H2019
# Reverse values and labels together so each bar keeps its own suburb name
barplot(rev(values2019), names.arg = wrap.labels(rev(suburbs2019), 20),
        col = "red", xlab = "Electricity consumption (MWh)", horiz = TRUE,
        las = 2, cex.names = 0.7)

# The 2020 dataset
values2020 <- extracted_data$`2020-R`
suburbs2020 <- extracted_data$H2020
# Reverse values and labels together so each bar keeps its own suburb name
barplot(rev(values2020), names.arg = wrap.labels(rev(suburbs2020), 20),
        col = "green", xlab = "Electricity consumption (MWh)", horiz = TRUE,
        las = 2, cex.names = 0.7)

# The 2023 dataset (missing entries are dropped first)
values2023 <- na.omit(extracted_data$`2023-R`)
suburbs2023 <- na.omit(extracted_data$`H-2023`)
# Plot the first 14 values; values and labels are reversed together
barplot(rev(values2023[1:14]), names.arg = wrap.labels(rev(suburbs2023), 20),
        col = "blue", xlab = "Electricity consumption (MWh)", horiz = TRUE,
        las = 2, cex.names = 0.7)

Remark 1: Some values in the third dataset are missing, so the yearly totals are not compared, as doing so would give a misleading picture.
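
To make the concern in Remark 1 concrete, the snippet below is a minimal sketch on invented numbers showing how a single missing value distorts a yearly total. The data frame `toy` and its columns are hypothetical and do not come from the report's data.

```r
# Invented values for three suburbs; NA marks a missing 2023 figure
toy <- data.frame(
  suburb = c("A", "B", "C"),
  y2019  = c(100, 200, 300),
  y2020  = c(110, 210, 310),
  y2023  = c(120, NA, 320)
)

# Number of missing values in each year
missing_per_year <- colSums(is.na(toy[, -1]))
print(missing_per_year)

# Naive totals: 2023 looks lower only because suburb B is missing
naive_totals <- colSums(toy[, -1], na.rm = TRUE)
print(naive_totals)

# Totals restricted to suburbs that have a value in every year
complete <- toy[complete.cases(toy), ]
complete_totals <- colSums(complete[, -1])
print(complete_totals)
```

In this toy case the naive 2023 total falls below the 2019 total even though every observed suburb increased, which is the kind of misreading the remark warns about.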